Wenpeng Xing

TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

May 31, 2026
Via arXiv

Detecting Is Not Resolving: The Monitoring Control Gap in Retrieval Augmented LLMs

May 26, 2026
Via arXiv

Composition Collapse: Stable Factual Knowledge Does Not Imply Compositional Reasoning

May 26, 2026
Via arXiv

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

May 26, 2026
Via arXiv

Cordon-MAS: Defending RAG against Knowledge Poisoning via Information-Flow Control

May 26, 2026
Via arXiv

Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation

Apr 09, 2026
Via arXiv

LatentAudit: Real-Time White-Box Faithfulness Monitoring for Retrieval-Augmented Generation with Verifiable Deployment

Apr 07, 2026
Via arXiv

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

Apr 07, 2026
Via arXiv

ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation

Jan 20, 2026
Via arXiv

ForgetMark: Stealthy Fingerprint Embedding via Targeted Unlearning in Language Models

Jan 13, 2026
Via arXiv